kvnemesis: disable SQL ops in safety mode #160647
Currently, there is a single kvnemesis operation that executes via SQL: `ToggleGlobalReads`. We have seen this operation get stuck and cause the test to time out under safety mode. The expected behavior is that any stuck operation would time out, but there appears to be a context cancellation propagation issue, most likely in lib/pq (not yet confirmed). This commit disables `ToggleGlobalReads` in safety mode to reduce test failures. This change should also help confirm whether this is the only operation susceptible to the hanging behavior.

Fixes: cockroachdb#160293
Release note: None
CI is failing consistently for this test, in different ways. I'll take a look before it shows up in test triage.
The most recent CI failure is an instance of #160653. We're trying to validate this outcome as a single atomic operation with a single timestamp, but that's clearly not the case here. I think we should still merge this, as it addresses a different issue.
bors r=stevendanna
158639: storepool: consider StoreLiveness status when determining suspect stores r=miraradeva a=arulajmani

First 4 commits from #158629; only the last one is relevant for this PR. Feel free to hold off on reviewing until that PR merges.

----

This patch treats as suspect any store from which we recently (within the suspect duration) withdrew StoreLiveness support. Doing so ensures we don't transfer leases to such stores, as shown by the changes to TestLeaseTransferAfterStoreLivenessSupportWithdrawn.

Closes #158513
Release note: None

160642: authors: add amerani to authors r=amerani a=amerani

Epic: None
Release note: None

160647: kvnemesis: disable SQL ops in safety mode r=stevendanna a=miraradeva

Currently, there is a single kvnemesis operation that executes via SQL: `ToggleGlobalReads`. We have seen this operation get stuck and cause the test to time out under safety mode. The expected behavior is that any stuck operation would time out, but there appears to be a context cancellation propagation issue, most likely in lib/pq (not yet confirmed). This commit disables `ToggleGlobalReads` in safety mode to reduce test failures. This change should also help confirm whether this is the only operation susceptible to the hanging behavior.

Fixes: #160293
Release note: None

160648: kvcoord: deflake BenchmarkTxnWriteBuffer r=stevendanna a=miraradeva

The `SendLocked` subtest has been consistently flaky since the test was introduced. The suspicion is that the time spent inside the timed section (between the timer stop/start calls) is extremely small compared to the time spent outside it. This tends to prevent the benchmark from reaching its 1s target, since 1s of timed execution could correspond to 20 minutes in total outside of the timed section. This commit removes the starting and stopping of the timer. The downside is that the benchmark now also measures the allocations in `makeBuffer`. However, it also makes this subtest consistent with the second subtest (`flushBufferAndSendBatch`) in that respect.

Fixes: #151650
Release note: None

160778: authors: add eric.alton to authors r=eric-alton a=eric-alton

Epic: None
Release note: None

Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
Co-authored-by: Alek Merani <alek.merani@cockroachlabs.com>
Co-authored-by: Mira Radeva <mira@cockroachlabs.com>
Co-authored-by: eric-alton <eric.alton@cockroachlabs.com>
Build failed (retrying...):
Build succeeded:
Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches. Issue #160293: branch-release-25.4, branch-release-26.1. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.

💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details.

error creating merge commit from 4eace13 to blathers/backport-release-25.4-160647: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 25.4.x failed. See errors above.

💡 Consider backporting to the fork repo instead of the main repo. See instructions for more details.

error creating merge commit from 4eace13 to blathers/backport-release-26.1-160647: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 26.1.x failed. See errors above.